Negentropy Based Agglomerative Clustering for Microarray Data

Authors

  • A. Ciaramella
  • A. Staiano
  • R. Tagliaferri
  • G. Longo
Abstract

Scientists have been successful in cataloguing genes through genome sequencing projects, and they can now generate vast quantities of gene expression data using microarrays. However, due to the sheer size of the data sets involved and to the complexity of the problems to be tackled, the biological community has so far had less success in understanding how genes and proteins are connected and how they operate within networks. Such challenges call for a novel approach to data mining and understanding that relies heavily on artificial intelligence tools. In this work we propose a hierarchy of two unsupervised clustering algorithms to cluster noisy data. The first algorithm is based on a competitive Neural Network or on a Probabilistic Principal Surfaces approach, and the second is an agglomerative clustering based on both Fisher and Negentropy information. In this way a hierarchical clustering algorithm is obtained. Different definitions of Negentropy information can be used in this algorithm. The only a priori information required is a dissimilarity threshold. Tests on several complex data sets showed that the approach performs well. We also note that the dissimilarity threshold can be used to understand the behavior of the data set. The method has been applied to a microarray data set of the cell cycle in the yeast Saccharomyces cerevisiae. These are noisy gene expression data with missing data points. In this case the reliability of the clusters provided by our method is confirmed by the fact that the clusters selected for the analysis contain genes with very low p-values.
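The second, agglomerative stage can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the first stage is taken as given (any over-segmentation into prototype clusters, passed in as `labels`); the dissimilarity between two clusters is taken to be the negentropy of their pooled points projected onto the Fisher discriminant direction; and negentropy is estimated with the classical moment-based approximation J(y) ≈ E[y³]²/12 + kurt(y)²/48, which is only one of the possible definitions. All function names are hypothetical, and each cluster is assumed to hold at least two points.

```python
import numpy as np

def negentropy_approx(y):
    """Moment-based negentropy estimate of a 1-D sample (assumed definition)."""
    y = (y - y.mean()) / y.std()          # standardize to zero mean, unit variance
    skew = np.mean(y ** 3)
    kurt = np.mean(y ** 4) - 3.0          # excess kurtosis, 0 for a Gaussian
    return skew ** 2 / 12.0 + kurt ** 2 / 48.0

def fisher_direction(a, b):
    """Unit vector of the Fisher linear discriminant between point sets a and b."""
    sw = (np.cov(a, rowvar=False) * (len(a) - 1)
          + np.cov(b, rowvar=False) * (len(b) - 1))   # within-class scatter
    sw = np.atleast_2d(sw) + 1e-6 * np.eye(a.shape[1])  # small ridge for stability
    w = np.linalg.solve(sw, a.mean(axis=0) - b.mean(axis=0))
    return w / np.linalg.norm(w)

def dissimilarity(a, b):
    """Negentropy of the pooled points projected on the Fisher direction.

    Near zero when the union looks Gaussian (one cluster), larger when the
    projection is bimodal (two clusters). This combination is an assumption.
    """
    w = fisher_direction(a, b)
    return negentropy_approx(np.concatenate([a, b]) @ w)

def agglomerate(x, labels, threshold):
    """Repeatedly merge the least dissimilar pair until none is below threshold."""
    labels = labels.copy()
    while True:
        ids = np.unique(labels)
        best = None
        for i, p in enumerate(ids):
            for q in ids[i + 1:]:
                d = dissimilarity(x[labels == p], x[labels == q])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, p, q)
        if best is None:                  # no pair is similar enough: stop
            return labels
        labels[labels == best[2]] = best[1]   # merge q into p
```

In this sketch `threshold` plays the role of the dissimilarity threshold mentioned in the abstract: a lower value keeps more clusters separate, and a higher value merges more aggressively.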


Similar resources

A multi-step approach to time series analysis and gene expression clustering

MOTIVATION: The huge growth in gene expression data calls for the implementation of automatic tools for data processing and interpretation. RESULTS: We present a new and comprehensive machine learning data mining framework consisting of a non-linear PCA neural network for feature extraction, and probabilistic principal surfaces combined with an agglomerative approach based on Negentropy aimed a...

Clustering and visualization approaches for human cell cycle gene expression data analysis

In this work a comprehensive multi-step machine learning data mining and data visualization framework is introduced. The different steps of the approach are: preprocessing, clustering, and visualization. A preprocessing based on a Robust Principal Component Analysis Neural Network for feature extraction of unevenly sampled data is used. Then a Probabilistic Principal Surfaces approach combined ...

An Integrated Data Analysis Approach to Preprocessing, Visualization and Clustering of Microarray Data

Microarray technologies represent a powerful tool in biological research, but in order to attain their full potential, it is crucial to develop techniques to effectively exploit the huge quantity of data produced. We propose an innovative tool specifically tailored to perform preprocessing, visualization and clustering on this type of data. The improvements with respect to more traditional...

Data Complexity in Clustering Analysis of Gene Microarray Expression Profiles

The increasing application of microarray technology is generating large amounts of high dimensional gene expression data. Genes participating in the same biological process tend to have similar expression patterns, and clustering is one of the most useful and efficient methods for identifying these patterns. Due to the complexity of microarray profiles, there are some limitations in directly ap...

Ensemble clustering method based on the resampling similarity measure for gene expression data.

The rapid development of microarray technologies has enabled the monitoring of expression levels of thousands of genes simultaneously. Microarray technology has great potential for creating an enormous amount of data in a short time, and has now become a new tool for studying such broad problems as classification of tumors in biology and medical science. Many statistical methods are available for anal...

Publication date: 2005